Visualization and Analysis of Tourism Income, Exchange Rate and Accommodation Price Index

1 Introduction

Time series data is a sequence of observations collected at successive time periods and ordered chronologically. When a dynamic ecosystem is examined, it is important to understand the relations between different datasets through analysis and visualization methods.

Within the frame of this project, three main datasets covering 2016-2021 are collected from the Central Bank of the Republic of Turkey website: Foreign Visitors Travel Incomes (million USD), the US Dollar (Buying) Exchange Rate and the Consumer Price Index of Accommodation Services. Before moving forward with the analysis, it is useful to make sure that the research question is consistent with solid scientific research carried out in the past.

According to the UNWTO (World Tourism Organization), Turkey ranks 6th in International Tourist Arrivals by Country of Destination and 14th in International Tourism Receipts, with an income of USD 29.8 billion [1]. The importance of tourism for Turkey cannot be ignored: it accounts for approximately 15% of overall income, increases employment opportunities and helps offset foreign trade deficits. According to many studies and the insights of tourism workers, tourism revenues are affected by exchange rates, because 73% of tourism income is supplied by foreign tourists (TUIK). Since goods and services become cheaper for foreign visitors as the exchange rate rises, an increase in demand, and eventually in tourism income, would be expected in return [2].

To summarize, the research question of this study is: Is there any correlation between the accommodation price index, the exchange rate and, eventually, foreign tourist income? This is a broad question that requires further analysis; this study aims to build a general understanding of it through data visualization methods.

2 Analysis

2.1 Manipulation of the Dataset

The datasets used in this project are retrieved from EVDS (the Central Bank's Electronic Data Delivery System) at monthly frequency, covering January 2016 to December 2021. This gives 72 data points per series; the data is merged and processed in R.

In addition, the data is checked numerically before further analysis, which reveals 3 missing data points in the Foreign Income dataset: April, May and June 2020. These are the months in which the pandemic started, and it is assumed that both tourism and the data collection process were interrupted by pandemic restrictions. Instead of replacing these cells with 0, a naive approach is used: the missing values are filled with the values of the same months of the previous year. The effect of the pandemic is not lost, since it is still visible in the subsequent months of the dataset. Note, however, that this choice likely overstates income for these three months, because the 2019 values predate the pandemic.

library(openxlsx)  # read.xlsx()
library(zoo)       # as.yearmon()
library(xts)       # xts time series objects

setwd('/Users/larahos/Desktop')  # adjust to the folder containing the Excel files
ForeignIncome <- read.xlsx('Foreign-Income.xlsx')
AccommodationInd <- read.xlsx('AccommodationInd.xlsx')
DollarBuy <- read.xlsx('DollarBuy.xlsx')

#no values are available for April, May and June 2020 (rows 52:54), so a naive approach is used:
#they are filled with the values of the same months of 2019 (rows 40:42)
ForeignIncome[52:54, "INCOME"] <- ForeignIncome[40:42, "INCOME"]

#merge the three datasets on their common Date column
joined_dt1 <- merge(ForeignIncome, AccommodationInd, by = 'Date')
data <- merge(joined_dt1, DollarBuy, by = 'Date')
colnames(data) <- c("Date", "Income", "Accommodation_Index", "Dollar_Exchange_Rate")

#convert all dates to year-month format, which is needed for the monthly time series object
data$Date <- as.yearmon(x = data$Date)

#create a time series object indexed by date; time acts as an index rather than an input variable here
data_ts <- xts(x = data[-1], order.by = data$Date, frequency = 12)
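The hard-coded row indices above (52:54 and 40:42) only work for this exact file layout. A minimal sketch of the same naive rule that locates the gaps itself is shown below, assuming regular monthly data sorted by date; `seasonal_naive_fill` is a hypothetical helper written for illustration, not part of any package.

```r
# Hypothetical helper: replace each NA with the value observed `period`
# steps earlier (a seasonal naive fill). Assumes a regular monthly series
# sorted by date. Positions within the first period have nothing to copy
# from and stay NA.
seasonal_naive_fill <- function(x, period = 12) {
  missing_idx <- which(is.na(x))
  for (i in missing_idx) {
    if (i > period) {
      x[i] <- x[i - period]
    }
  }
  x
}

# Illustrative check on a made-up 24-month series with three gaps
toy <- c(1:24)
toy[16:18] <- NA
filled <- seasonal_naive_fill(toy)
print(filled[16:18])  # values copied from positions 4:6
```

On the study data this would read `ForeignIncome$INCOME <- seasonal_naive_fill(ForeignIncome$INCOME)`, assuming the empty cells are read in as `NA`.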

2.2 Numerical Analysis of the Dataset

The dataset is first examined numerically, starting with the summary of the time series object. The mean and median of the Accommodation Index (399.6 vs 391.0) and of the Exchange Rate (5.499 vs 5.533) are very close, which suggests roughly symmetric distributions, in contrast to Income, whose mean (1512.9) lies well above its median (1240.0). The minimum value of Income is extremely small, which may be an outcome of the pandemic. The head and tail of the Income data also show signs of a seasonal pattern, related to summer tourism in Turkey.

Furthermore, the exchange rate accelerated sharply in the last months of 2021, rising from about 9.14 in October to 13.53 in December. This rapid increase is probably not yet fully reflected in the accommodation price index or in foreign tourism income, because of the natural delay with which price changes propagate: they first appear in producer price indexes and only later reach consumers.

All of these relations will be seen more clearly when the time series data is visualized.

summary(data_ts)
##      Index          Income       Accommodation_Index Dollar_Exchange_Rate
##  Min.   :2016   Min.   : 286.0   Min.   :297.3       Min.   : 2.835      
##  1st Qu.:2017   1st Qu.: 867.2   1st Qu.:311.7       1st Qu.: 3.660      
##  Median :2019   Median :1240.0   Median :391.0       Median : 5.533      
##  Mean   :2019   Mean   :1512.9   Mean   :399.6       Mean   : 5.499      
##  3rd Qu.:2020   3rd Qu.:2166.5   3rd Qu.:458.4       3rd Qu.: 6.876      
##  Max.   :2022   Max.   :3864.0   Max.   :657.0       Max.   :13.528
head(data_ts)
##          Income Accommodation_Index Dollar_Exchange_Rate
## Jan 2016    825              304.03             3.006950
## Feb 2016    699              303.23             2.940662
## Mar 2016    904              303.37             2.891739
## Apr 2016    844              303.54             2.834738
## May 2016   1233              302.88             2.926595
## Jun 2016   1241              297.31             2.916986
tail(data_ts)
##          Income Accommodation_Index Dollar_Exchange_Rate
## Jul 2021   2201              531.38             8.612941
## Aug 2021   3077              548.97             8.475714
## Sep 2021   2483              574.76             8.511882
## Oct 2021   2808              608.43             9.139945
## Nov 2021   1477              623.24            10.523264
## Dec 2021   1089              656.99            13.528496
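The acceleration of the exchange rate described above can be quantified from the printed values. The short sketch below computes month-over-month percentage changes from the six `tail()` rows; the vector is simply copied from that output.

```r
# Month-over-month percentage change of the USD/TRY rate, using the
# Jul-Dec 2021 values printed by tail(data_ts) above
usd_try <- c(Jul = 8.612941, Aug = 8.475714, Sep = 8.511882,
             Oct = 9.139945, Nov = 10.523264, Dec = 13.528496)
pct_change <- 100 * diff(usd_try) / head(usd_try, -1)
round(pct_change, 1)
# Aug -1.6, Sep 0.4, Oct 7.4, Nov 15.1, Dec 28.6
```

Each change from September onward is larger than the one before it, which is what "positive acceleration" means here.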

2.3 Visualization of Data

2.3.1 Line Plots for Three Datasets

The visualization process starts by drawing line plots of the datasets to get a general idea of their behaviour over time.

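library(ggplot2)
library(hrbrthemes)  # theme_ipsum()

#line plots give a general idea about each dataset's behaviour over time
#(ggplot converts the xts object to a data frame with an Index column; par() has no effect on ggplot)
#linewidth requires ggplot2 >= 3.4.0; use size = 1 on older versions
ggplot(data_ts, aes(x = Index)) +
  geom_line(linewidth = 1, color = "brown", aes(y = Income)) +
  theme_ipsum() + ggtitle("1. Time Series of Foreign Tourism Income")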

Tourism income reached its highest level in Summer 2019. After the devastating effects of the pandemic on the tourism sector (travel restrictions, lockdowns, social distancing rules), the sector has taken time to recover toward the levels it used to reach.

Foreign Tourist Income also shows a seasonal pattern, rising in the May-September period. Summer tourism accounts for approximately 70% of total tourism income, which explains the shape of the Income line (TUIK).

ggplot(data_ts, aes(x = Index)) +
  geom_line(linewidth = 1, color = "red", aes(y = Accommodation_Index)) +
  theme_ipsum() + ggtitle("2. Time Series of Accommodation Price Index")

ggplot(data_ts, aes(x = Index)) +
  geom_line(linewidth = 1, color = "brown", aes(y = Dollar_Exchange_Rate)) +
  theme_ipsum() + ggtitle("3. Time Series of Exchange Rate USD/TRY")

The price index and the dollar exchange rate both have an increasing trend that is not significantly affected by the pandemic. This can be explained by the fact that "whenever the world economy seems riskier, investors gravitate toward greenbacks" [3]. In addition, the harsh interruption of the production and service sectors, which lasted at least 3 months, had a remarkable effect on the devaluation of the Turkish lira.
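Because both series rise, what matters for a visitor paying in dollars is their ratio. The sketch below divides the accommodation index by the USD/TRY rate for the first and last months printed by `head()` and `tail()`. This is a back-of-the-envelope indicator assumed for illustration, not a formal real exchange rate: it ignores inflation in visitors' home countries and all other price components.

```r
# Accommodation price index (TRY based) divided by the USD/TRY rate:
# a rough indicator of accommodation cost for a visitor paying in USD.
# Values are the first and last rows printed by head() and tail() above.
index_try <- c(Jan2016 = 304.03, Dec2021 = 656.99)
usd_try   <- c(Jan2016 = 3.006950, Dec2021 = 13.528496)
index_usd <- index_try / usd_try
round(index_usd, 1)   # about 101.1 and 48.6
# percentage change in the USD-denominated index, about -52
round(100 * (index_usd[["Dec2021"]] / index_usd[["Jan2016"]] - 1), 1)
```

Under these assumptions, accommodation became roughly half as expensive in dollar terms over the period, even though the TRY index more than doubled.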

#plot.zoo draws one panel per series, each with its own y-axis label
indexes <- c("Income", "Accommodation", "Dollar")
plot(as.zoo(data_ts), main = "Line Plot for 3 Datasets", xlab = "Date (Monthly)", ylab = indexes)

All three series have an increasing trend over 2016-2020. The Accommodation Price Index and the Exchange Rate peak at the same time (Summer 2018), which may hint at a link between them, although simultaneous peaks alone do not establish causality. Comparing the successive peaks of tourism income, an increasing trend is also observable.
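Whether one series leads another, as with the "natural delay" between exchange rate and prices mentioned earlier, can be examined with the cross-correlation function (`stats::ccf`). The sketch below uses synthetic data in which `y` repeats `x` with a two-step delay; it only illustrates how to read the output and is not a result for the study data.

```r
# Cross-correlation as a way to look for a lead/lag between two series.
# Synthetic data: y repeats x with a two-step delay.
set.seed(42)
x <- rnorm(100)
y <- c(0, 0, x[1:98])
# ccf(x, y) at lag k estimates cor(x[t + k], y[t])
cc <- ccf(x, y, lag.max = 6, plot = FALSE)
best_lag <- cc$lag[which.max(cc$acf)]
best_lag   # -2: x leads y by two steps
```

On the trending series in this study, it would be applied to differenced values, e.g. `ccf(diff(as.numeric(data_ts$Dollar_Exchange_Rate)), diff(as.numeric(data_ts$Accommodation_Index)))`, since a shared trend inflates cross-correlations at every lag.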

2.3.2 Histograms for Three Datasets

Histograms summarize a dataset by counting the observations that fall into equal-width intervals (bins) and drawing the counts as vertical bars. The distributions of the data can be seen below.

library(funModeling)  # plot_num()
plot_num(data[,-1], bins=10)

The values are spread across the range but cluster around the lower values. Apart from some outliers, which are possible outcomes of the pandemic and Turkey's unstable economic environment, the data looks roughly bell-shaped; however, the clustering at the lower end means the distributions are right-skewed rather than strictly normal. The distribution of each attribute is shown below:
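Whether a distribution is close enough to normal can also be checked formally, for example with the Shapiro-Wilk test (`stats::shapiro.test`), where a small p-value is evidence against normality. The sketch below uses illustrative synthetic samples of the same length as the study data (72 months), one symmetric and one with a long right tail; the means and rates are assumptions chosen only to resemble the scale of the data.

```r
# Shapiro-Wilk test of normality on two illustrative samples of length 72
set.seed(1)
symmetric_sample <- rnorm(72, mean = 400, sd = 50)
skewed_sample    <- rexp(72, rate = 1 / 1500)   # long right tail
shapiro.test(symmetric_sample)$p.value
shapiro.test(skewed_sample)$p.value   # very small: normality rejected
```

On the study data the same check could be run column by column with `sapply(data[, -1], function(v) shapiro.test(v)$p.value)`.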

#after_stat(density) replaces the deprecated ..density.. notation (ggplot2 >= 3.3.0)
ggplot(data_ts, aes(x = Income)) +
  geom_histogram(aes(y = after_stat(density)), colour = "navyblue", fill = "lightskyblue", bins = 30) +
  geom_density(alpha = .2, fill = "pink", colour = "red")

ggplot(data_ts, aes(x = Accommodation_Index)) +
  geom_histogram(aes(y = after_stat(density)), colour = "navyblue", fill = "lightskyblue", bins = 30) +
  geom_density(alpha = .2, fill = "pink", colour = "red")

ggplot(data_ts, aes(x = Dollar_Exchange_Rate)) +
  geom_histogram(aes(y = after_stat(density)), colour = "navyblue", fill = "lightskyblue", bins = 30) +
  geom_density(alpha = .2, fill = "pink", colour = "red")

The distributions of the Accommodation Index and the Dollar Exchange Rate both show two peaks (they are bimodal). Income is not distributed in this manner, because many more factors affect it, such as the season and political relations.

It is also useful to analyze the Income data grouped by month, since it shows a seasonal pattern and the July-September period, the third quarter, is the most profitable part of the year.

library(lubridate)  # month()

ggplot(data_ts, aes(x = Income)) +
  geom_histogram(aes(y = after_stat(density)), colour = "navyblue", fill = "lightskyblue", bins = 15) +
  geom_density(alpha = .2, fill = "pink", colour = "red") +
  facet_wrap(~month(Index), ncol = 3) +
  labs(title = "Monthly Histograms of Foreign Tourist Income 2016-2021",
       x = "Total Income (million USD)",
       y = "Density")

Especially for the months of Turkey's tourism season, the data looks roughly normal with some extreme points. High values do not occur in winter and fall months.
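The monthly grouping used in the facet plot can also be summarized numerically by averaging each calendar month across years. The sketch below uses `tapply()` with `cycle()`, which returns the position of each observation within the year for a monthly `ts`. The series is made up for illustration (a summer peak repeated over 6 years) and is not the study data.

```r
# Average value for each calendar month of a monthly ts object.
# Illustrative series: a made-up summer peak repeated over 6 years.
season <- c(1, 1, 1, 2, 3, 4, 5, 5, 4, 2, 1, 1)
toy_income <- ts(rep(season, times = 6) * 500, start = c(2016, 1), frequency = 12)
monthly_mean <- setNames(as.numeric(tapply(toy_income, cycle(toy_income), mean)),
                         month.abb)
monthly_mean
```

For the study data, the input would be `ts(as.numeric(data_ts$Income), start = c(2016, 1), frequency = 12)`.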

2.3.3 Analysis with Normalized Values

The scales of the compared datasets differ: this study examines an income in million USD, an exchange rate and a price index. When they are plotted together without any transformation, the changes in the rate and the index look negligible next to income. Therefore, the data is normalized with min-max normalization, which maps the smallest value of each series to 0 and the largest to 1. The resulting line plot is shown below:
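Written out, the transformation applied by `min_max_norm` to each series x_t, with t running over the 72 months, is:

```latex
x'_t = \frac{x_t - \min_s x_s}{\max_s x_s - \min_s x_s}, \qquad 0 \le x'_t \le 1
```

Because this is a positive linear transformation, it changes only the scale: the timing of peaks and the Pearson correlations between the series are the same as for the raw data.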

#min-max normalization rescales a series so that its minimum becomes 0 and its maximum 1
min_max_norm <- function(x) {
  (x - min(x)) / (max(x) - min(x))
}
normalized_data_ts <- data_ts
normalized_data_ts$Accommodation_Index <- min_max_norm(data_ts$Accommodation_Index)
normalized_data_ts$Income <- min_max_norm(data_ts$Income)
normalized_data_ts$Dollar_Exchange_Rate <- min_max_norm(data_ts$Dollar_Exchange_Rate)

#plot all normalized series in a single frame to compare their behaviour
ggplot(normalized_data_ts) +
  geom_line(aes(x = Index, y = Income, color = "Foreign Tourist Income")) +
  geom_line(aes(x = Index, y = Accommodation_Index, color = "Accommodation Price Index")) +
  geom_line(aes(x = Index, y = Dollar_Exchange_Rate, color = "Dollar Exchange Rate")) +
  labs(color = "Series") +
  ggtitle("Comparison of Normalized Values")

Foreign Tourist Income fluctuates more because of its seasonal pattern. The Price Index has been rising faster over the last two years, due to the rapid depreciation of the Turkish lira and high inflation. Even as the pandemic eased, Summer 2021 income remained well below pre-pandemic levels.

To remove the seasonal effect, the trends of the attributes are compared. Since this project is mainly based on visualization rather than model fitting, the trend component of each series is extracted with classical decomposition and plotted. As seen below, the three attributes show a similar upward pattern until 2020. In 2020, Income falls sharply.

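new.ts <- ts(coredata(normalized_data_ts), start = c(2016, 1), frequency = 12)
#decompose() is meant for a single series, so the trend of each column is extracted separately
trends <- sapply(seq_len(ncol(new.ts)), function(i) decompose(new.ts[, i])$trend)
Trend_Comparison <- ts(trends, start = c(2016, 1), frequency = 12, names = colnames(new.ts))
plot(Trend_Comparison)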
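For monthly data, the trend component returned by `decompose()` is a centered 2x12 moving average. The sketch below reproduces it with `stats::filter` on R's built-in `AirPassengers` series, used here only because it is a convenient monthly example, and checks that both give the same values.

```r
# Classical decomposition estimates the trend of a monthly series with a
# centered 2x12 moving average: weights 1/24, 1/12 (x11), 1/24.
ma_weights <- c(0.5, rep(1, 11), 0.5) / 12
manual_trend <- stats::filter(AirPassengers, filter = ma_weights, sides = 2)
builtin_trend <- decompose(AirPassengers)$trend
all.equal(as.numeric(manual_trend), as.numeric(builtin_trend))  # TRUE
sum(is.na(builtin_trend))  # 12: six months lost at each end
```

One consequence for the trend plot above: the first and last six months of 2016-2021 have no trend value, and the imputed April-June 2020 values feed into the trend around that period.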

2.4 Correlation Analyses

In addition to the visual inspection of the time series, calculating correlations is an important step to understand how tight the relationships between the attributes are. As the table shows, the Price Index and the Exchange Rate have a very high correlation, close to 1, indicating an almost perfect positive linear relationship, and ggpairs marks this correlation with stars as highly significant. Part of this value likely reflects the upward trend that both series share over the period.

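library(GGally)  # ggpairs(), ggcorr()

ggpairs(data[,c(2,3,4)])

#label = TRUE prints the correlation coefficients on the heatmap
ggcorr(data[,c(2,3,4)], label = TRUE)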
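A caveat worth checking: two series that both trend upward over 2016-2021 will correlate strongly even without a direct link. One simple check is to compare the correlation of the levels with the correlation of month-to-month changes. The sketch below uses synthetic series that share only a linear trend (their noise is independent); the numbers are illustrative, not results for this study.

```r
# Two synthetic series that share only an upward trend
set.seed(7)
month_idx <- 1:72
a <- month_idx + rnorm(72, sd = 3)
b <- 0.5 * month_idx + rnorm(72, sd = 3)
cor(a, b)                            # high, driven by the common trend
cor(diff(a), diff(b))                # near zero once the trend is removed
cor.test(diff(a), diff(b))$p.value
```

On the study data the analogous call would be `cor(diff(data$Accommodation_Index), diff(data$Dollar_Exchange_Rate))`.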

For seasonal data it is also useful to compute the autocorrelation function (ACF), here up to lag 12. The correlations at small lags are high, which is a sign of the increasing trend in the dataset. The ACF also has a cyclical shape, which is typical behaviour for seasonal data.

acf(data_ts$Income, lag.max = 12, main = "Foreign Visitors Travel Incomes Seasonal Analysis")
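The values plotted by `acf()` follow the standard sample autocorrelation formula, r_k = sum over t of (x_t - mean)(x_(t+k) - mean) divided by sum over t of (x_t - mean)^2. The sketch below reproduces them by hand on R's built-in `AirPassengers` series, a monthly, trending and seasonal series chosen only for illustration.

```r
# Sample autocorrelation at lag k:
# r_k = sum_{t=1}^{n-k} (x_t - xbar)(x_{t+k} - xbar) / sum_{t=1}^{n} (x_t - xbar)^2
sample_acf <- function(x, k) {
  x <- as.numeric(x)
  n <- length(x)
  d <- x - mean(x)
  sum(d[1:(n - k)] * d[(1 + k):n]) / sum(d^2)
}

manual <- sapply(1:12, function(k) sample_acf(AirPassengers, k))
builtin <- acf(AirPassengers, lag.max = 12, plot = FALSE)$acf[2:13]  # drop lag 0
all.equal(manual, builtin)  # TRUE
```

A slowly decaying ACF signals trend, while a bump at lag 12 signals yearly seasonality, which is how the Income ACF above is read.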

Since the visualization section showed a similar upward trend in all three series, a further analysis is conducted on the Foreign Visitors Travel Income dataset. The pandemic hit tourism income much harder than the other two series. In the figure below, the months from March 2020 onward are removed from all datasets to examine the correlations in a more stable environment.

#rows 51:72 correspond to March 2020 - December 2021, i.e. the pandemic period
without_covid_data <- data[-(51:72),]
ggpairs(without_covid_data[,c(2,3,4)])
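Removing rows 51:72 by position depends on the data starting exactly in January 2016. A sketch of the same selection by date is shown below, assuming 72 consecutive monthly observations starting January 2016; it uses base R `Date` values so that it runs on its own.

```r
# Select pre-pandemic months by date instead of by row position
month_starts <- seq(as.Date("2016-01-01"), by = "month", length.out = 72)
pre_pandemic <- month_starts < as.Date("2020-03-01")
sum(pre_pandemic)                   # 50 rows: January 2016 to February 2020
range(month_starts[pre_pandemic])
```

With the `yearmon` Date column used in this study, the equivalent selection would be `data[data$Date < as.yearmon("2020-03"), ]`.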

As observed in the figures above, the correlations between Income and the other variables increase noticeably, which is consistent with the earlier research cited in the introduction and supports a relationship between the dollar rate and tourism income. As the exchange rate rises, spending a holiday in Turkey becomes cheaper for foreign travelers and demand increases, which eventually raises income. However, as stated before, this is not the only factor affecting tourism income.

3 Conclusion

In this study, the correlations between Foreign Traveler Income, the Dollar Exchange Rate and the Accommodation Price Index are examined. Visual inspection of the trends suggests a similar upward trend in all three time series. The correlation analysis shows that the relationship between the accommodation index and the exchange rate is positive and extremely strong. Seasonality and pandemic restrictions have a large effect on Tourism Income, but when the months from March 2020 onward are excluded from the data, a higher correlation with the Exchange Rate is obtained.
In the last step of the study, Google Trends data is included, the relations are visually inspected, and a new correlation is explored even in the off-season. The analyses conducted in this study are consistent with the references given in the introduction, keeping in mind that correlation does not imply causation.